Skip to content

[epoll] Unix.select conversion: replace with stdext modules/calls or polly calls #5705

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

edwintorok
Copy link
Contributor

Builds on top of #5704

@edwintorok
Copy link
Contributor Author

The previous PR for this was #4877, we should check whether all the review comments from that PR got addressed.

@edwintorok edwintorok changed the title Unix.select conversion: replace with stdext modules/calls or polly calls [epoll] Unix.select conversion: replace with stdext modules/calls or polly calls Jun 19, 2024
@edwintorok
Copy link
Contributor Author

edwintorok commented Jul 2, 2024

I've rebased this on top of latest feature perf + master that includes the new clock module: https://github.com/edwintorok/xen-api/tree/private/edvint/epoll3-pr-updated.
I'll look at addressing the other review comments next, and then force push the branch (I'll probably also need to update the base to include latest feature/perf once a few more PRs to feature/perf get merged).

@edwintorok edwintorok changed the base branch from feature/epoll-next to feature/epoll-next2 July 2, 2024 17:24
@edwintorok edwintorok force-pushed the private/edvint/epoll3-pr branch from aa057ea to 7f36fcd Compare July 2, 2024 17:25
@coveralls
Copy link

Coverage Status

coverage: 44.887%. remained the same
when pulling 7f36fcd on edwintorok:private/edvint/epoll3-pr
into 2d3bae4 on xapi-project:feature/epoll-next2.

@edwintorok
Copy link
Contributor Author

This PR depends on #5766, which in turn depends on the quicktest bugfix on master (I've reverted the commit in this branch, to be able to test it, but I should eventually drop that revert once we got the proper fix).

@edwintorok edwintorok marked this pull request as draft July 2, 2024 17:34
@edwintorok
Copy link
Contributor Author

There were a lot of changes, rebases here to keep up with the changes on master and I found one rebase error so far:

wait remaining_time;
wait remaining_time

I tried modifying the tests to detect this doubling, but it doesn't yet.

Putting this back to draft, also we'll need to carefully review the code, it should be simpler, but there were a lot of rebases/fixups and might've completely broken it (I haven't looked at the diffs yet, just at unit test results, and running a BVT now).

@edwintorok edwintorok force-pushed the private/edvint/epoll3-pr branch from 7f36fcd to 1b9dbdc Compare July 3, 2024 17:50
@coveralls
Copy link

Coverage Status

coverage: 44.887%. remained the same
when pulling 1b9dbdc on edwintorok:private/edvint/epoll3-pr
into 2d3bae4 on xapi-project:feature/epoll-next2.

@edwintorok edwintorok force-pushed the private/edvint/epoll3-pr branch from 1b9dbdc to f23e3f9 Compare July 3, 2024 18:04
@edwintorok
Copy link
Contributor Author

Found the bug with HA: when removing the Timeout module and replacing it with the Clock.Timer module and spans there was no delay function, so I implemented one, but got the conversion wrong, and gave Unix.sleepf and argument that was in nanoseconds instead of seconds, so XAPI ended up waiting for a very long time before connecting to block_device_io, which gave up, raised a timeout error and exited with code 2.

I've now added a test for the delay function, and used the Clock.Timer.span_to_s to perform the conversion.

@coveralls
Copy link

Coverage Status

coverage: 44.887%. remained the same
when pulling f23e3f9 on edwintorok:private/edvint/epoll3-pr
into 2d3bae4 on xapi-project:feature/epoll-next2.

@lindig
Copy link
Contributor

lindig commented Jul 4, 2024

Found the bug with HA: when removing the Timeout module and replacing it with the Clock.Timer module and spans there was no delay function, so I implemented one, but got the conversion wrong, and gave Unix.sleepf and argument that was in nanoseconds instead of seconds, so XAPI ended up waiting for a very long time before connecting to block_device_io, which gave up, raised a timeout error and exited with code 2.

I've now added a test for the delay function, and used the Clock.Timer.span_to_s to perform the conversion.

This perfectly demonstrates the problem of having the various units for time spans and why we have to minimise this and probably need to encode the expected unit in a label for arguments.

@edwintorok
Copy link
Contributor Author

This perfectly demonstrates the problem of having the various units for time spans and why we have to minimise this and probably need to encode the expected unit in a label for arguments.

I've attempted to hide these conversions inside the Unixext module, so code in XAPI shouldn't need to interact with this directly anymore:

  • for epoll the conversion from Mtime.Span.t to milliseconds is handled internally in Unixext, XAPI passes a type safe Mtime.Span.t
  • for Unix.sleepf again the call is hidden in Unixext, exposing only a type-safe Mtime.Span.t

floats can be ambigous. The usual convention in the Unix module is that floats represent seconds, and that is why I keep making this mistake.
But Mtime doesn't follow that convention and returns a float the represents nanoseconds. To be fair the name of the function is very explicit that it returns nanoseconds.

In general Mtime.Span.t is what we should use, and we should hide the conversions to other units needed by syscalls/library calls in small functions/modules that are tested.
Testing is the part that I was missing (I think it was tested before, but I dropped those tests when removing the Timeout module), but I've added it back now.

@edwintorok
Copy link
Contributor Author

edwintorok commented Jul 8, 2024

I think this PR is "ready", but I'm waiting for feature/perf to be merged to master before undrafting it as that will require some updates to the target branch of the PR and a rebase (the CI now fails due systemd being dropped on master).

@edwintorok
Copy link
Contributor Author

Apparently there is a problem with the timeout tests now, will investigate tomorrow:

Timed out earlier than requested: 187μs < 1ms
       Raised at QCheck2.Test.fail_report in file "src/core/QCheck2.ml", line 1611, characters 22-41
       Called from Dune__exe__Unixext_test.test_time_limited_write.(fun) in file "ocaml/libs/xapi-stdext/lib/xapi-stdext-unix/test/unixext_test.ml", line 87, characters 12-131
Error: Exception: qcheck: user fail:
Timed out earlier than requested: 98.5μs < 1ms
       Raised at QCheck2.Test.fail_report in file "src/core/QCheck2.ml", line 1611, characters 22-41
       Called from Dune__exe__Unixext_test.test_time_limited_write.(fun) in file "ocaml/libs/xapi-stdext/lib/xapi-stdext-unix/test/unixext_test.ml", line 87, characters 12-131
Error: Exception: qcheck: user fail:
Timed out earlier than requested: 109μs < 1ms
       Raised at QCheck2.Test.fail_report in file "src/core/QCheck2.ml", line 1611, characters 22-41
       Called from Dune__exe__Unixext_test.test_time_limited_write.(fun) in file "ocaml/libs/xapi-stdext/lib/xapi-stdext-unix/test/unixext_test.ml", line 87, characters 12-131
[✗]   44    0    1   22 /  100     0.4s Dune__exe__Unixext_test.test_time_limited_write

@edwintorok edwintorok force-pushed the private/edvint/epoll3-pr branch from 531714b to 9f5a4ef Compare July 9, 2024 08:39
@edwintorok edwintorok changed the base branch from feature/epoll-next2 to master July 10, 2024 17:03
@edwintorok edwintorok force-pushed the private/edvint/epoll3-pr branch 2 times, most recently from 41287a9 to 9e638c6 Compare July 16, 2024 17:39
@edwintorok
Copy link
Contributor Author

Fixed a few more bugs due to differences between select/epoll with a 0 timeout, and SO_RCVTIMEO with a 0 timeout.
The former times out immediately, the latter never times out (0 = no timeout).

I've added some code to detect calls to setsockopt with a timeout <1e-6 and made it raise invalid_arg to catch these (similar to what I've done with the tests on master). There is one exception: when we finish a function, we "restore" the timeout to 0, so in that case the call is allowed (I inserted the check into with_socket_timeout rather than set_socket_timeout).

There was in fact already a wrapper in the Http module for this, I moved it to unixext so we can reuse it.

@edwintorok edwintorok marked this pull request as ready for review July 18, 2024 10:07
@edwintorok edwintorok changed the base branch from master to feature/perf July 18, 2024 10:07
@psafont
Copy link
Member

psafont commented Jul 18, 2024

The PR includes commits from master, only the latest commits should be reviewed (all authored by Edwin)

@@ -325,16 +325,20 @@ let listen_on sock =
s

let accept_conn s latest_response_time =
let now = Unix.gettimeofday () in
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commit contains:

TODO: needs some extra testing of block_device_io, which doesn't many suitable tests.

Has this been done in some way?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is 1 XenRT test capable of detecting breakages in the redolog, the one that got added for HA vTPM testing.
(all other HA XenRT tests used to pass with a completely broken redolog in the past).

There is a test for epoll on block devices part of the stdext quickcheck tests, although it requires root so it is not run in the CI.
I'll add some code to the XAPI quicktests so we can run some of these fuzzers in Dom0.

I think it is still worthwhile trying to add a separate test for the redolog itself that runs as part of the quicktests (it needs a blockdevice, and setting one up even in loopback mode requires root, so can't be done in the CI).

Test.fail_reportf "Timed out earlier than requested: %f < %f"
elapsed_s timeout ;
let elapsed = !test_elapsed in
if Mtime.Span.compare elapsed timeout < 0 then
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if Mtime.Span.compare elapsed timeout < 0 then
if Clock.Timer.span_is_shorter elapsed ~than:timeout then

if not (wait remaining_time) then raise Timeout ;
let bytes_to_read = total_bytes_to_read - !bytes_read in
let bytes =
try Unix.read filedesc buf !bytes_read bytes_to_read
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code seems almost all duplicated, only the function called changes (write / read), can this be factored out?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Xapi-fdcaps has a better factored out implementation (there is a repeat_read and repeat_write, although much of the code of those 2 functions could be shared too, and there is no Timer version, but one could be easily introduced).

Xapi-fdcap is what I use to test unixext, so if we switch unixext over to xapi-fdcaps directly, then it would be testing itself with itself (although the way timeout testing is done is independent of how it is implemented, so the test would still be a useful one).

Unixext could be changed to use xapi-fdcaps, although the type changes will probably ripple through the codebase. It might result in an overall safer code, because we'd know from its type whether a file descriptor is a socket or not for example (and then whether calling setsockopt is a valid call or not).

I'll check whether switching over directly to fdcaps would result in simpler code, if not I'll factor out the code here in stdext.

@edwintorok
Copy link
Contributor Author

edwintorok commented Jul 18, 2024

I should do a separate update from master PR again to make this easier to review. #5853

snwoods and others added 13 commits July 18, 2024 15:42
Use SO_RCVTIMEO socket option instead.
This will cause EAGAIN or EWOULDBLOCK to be returned according to `socket(7)`.

Need to be careful about the distinction between select 0 and SO_RCVTIMEO 0.
select 0 would timeout immediately, but SO_RCVTIMEO would timeout never.

Signed-off-by: Steven Woods <steven.woods@citrix.com>
Signed-off-by: Edwin Török <edwin.torok@cloud.com>
This introduces a dependency between 'ezxenstore' and 'xapi-stdext-threads'.

Signed-off-by: Edwin Török <edwin.torok@cloud.com>
…Unix/Lwt

It was a mix of Lwt and Unix code, which means that if the Unix call blocks the entire Lwt code blocks too.
This was only installed by the specfile in a -devel package.

`message-cli tail --follow` can be used to debug the IDL protocol instead.

Signed-off-by: Edwin Török <edwin.torok@cloud.com>
Its implementation was identical, except for the use of time_limited_read in Threadext,
but the semantics is identical.

Use one well tested implementation instead of duplicating code.

One less function to convert to epoll.

Signed-off-by: Edwin Török <edwin.torok@cloud.com>
Absolute deadlines can be represented using `Mtime.t`, or `Mtime_clock.counter` and comparing against a deadline.
We hide this as an implementation detail in `Timer.t` and use counters (although an implementation using `Mtime.t` would work just as well).

Relative deadlines are represented using `Mtime.Span.t`.
We wrap it in `Timeout.t` to avoid accidentally mistaking it for a time duration calculated elsewhere.

These helper modules correctly convert from time represented as strings (e.g. in configuration files),
and calculate the remaining time until the deadline.
`Mtime.Span.t` expects `float`s as nanoseconds, which can be quite error prone when converting existing code if the conversion factor is forgotten.

The various `time_limited*` functions will be changed to take a `Timer.t` as argument in a followup commit.

Signed-off-by: Edwin Török <edwin.torok@cloud.com>
Signed-off-by: Edwin Török <edwin.torok@cloud.com>
Instead of Unix.gettimeofday.

This is required to start using these functions in Jsonrpclient,
which already uses Mtime.
Mtime should also be more robust against clock changes on the host.

TODO: needs some extra testing of block_device_io, which doesn't many suitable tests.

Signed-off-by: Edwin Török <edwin.torok@cloud.com>
Signed-off-by: Steven Woods <steven.woods@citrix.com>
Note: Unixext.proxy is more efficient and doesn't recreate the epoll instance many times.
But xsh is not performance critical.

Signed-off-by: Steven Woods <steven.woods@citrix.com>
Signed-off-by: Edwin Török <edwin.torok@cloud.com>
…t_timed_read

Signed-off-by: Edwin Török <edwin.torok@cloud.com>
…Unix.select

Signed-off-by: Edwin Török <edwin.torok@cloud.com>
If we want to reproduce a failure we need to know the exact commandline that was used.
Longer than 80 chars is not a problem, this is a logfile, and a truncated line is worse than a long line.

Signed-off-by: Edwin Török <edwin.torok@cloud.com>
@edwintorok edwintorok force-pushed the private/edvint/epoll3-pr branch from 9e638c6 to ef13b44 Compare July 18, 2024 15:04
@edwintorok edwintorok marked this pull request as draft July 18, 2024 15:23
@edwintorok
Copy link
Contributor Author

I'm going to take a different approach with epoll, see master...edwintorok:xen-api:private/edvint/epoll4.
That should be a much smaller change (still needs some tests) + a mechanical search/replace on the codebase, and then we can do the other improvements from this branch on top of it (I didn't want to do it that way originally because introduces a performance problem, but we've got the rest of the commits now that avoid it, it just taking too long review them).

@edwintorok
Copy link
Contributor Author

I've split this PR:

PRs that should go to master (to avoid further conflicts on the epoll branch, and don't yet convert to epoll):
#5861
#5862
#5863

Alternative approach started here (untested so far):
#5864

We can then do the Timer and xapi-fdcaps conversion on top.

@edwintorok edwintorok closed this Jul 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants